Learning under Non-Stationarity: Covariate Shift and Class-Balance Change

Authors

  • Masashi Sugiyama
  • Makoto Yamada
Abstract

One of the fundamental assumptions behind many supervised machine learning algorithms is that training and test data follow the same probability distribution. However, this important assumption is often violated in practice, for example, because of unavoidable sample selection bias or non-stationarity of the environment. When the assumption is violated, standard machine learning methods suffer from significant estimation bias. In this article, we consider two scenarios of such distribution change, namely covariate shift, where input distributions differ, and class-balance change, where class-prior probabilities vary in classification, and we review semi-supervised adaptation techniques based on importance weighting.
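The common thread of the reviewed techniques is importance weighting: each training sample is reweighted by the density ratio w(x) = p_test(x) / p_train(x) so that training on the shifted training distribution approximates training on the test distribution. The sketch below illustrates this idea under covariate shift; the discriminator-based ratio estimator, the Ridge regressor, and the synthetic data are illustrative assumptions only, not the specific estimators reviewed in the article (which include dedicated density-ratio methods such as KLIEP and uLSIF).

```python
# Minimal sketch of importance-weighted training under covariate shift.
# The density ratio w(x) = p_test(x) / p_train(x) is estimated with a
# probabilistic classifier that discriminates test inputs from training
# inputs; this is one of several possible estimators, used here only
# for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

def estimate_importance_weights(X_train, X_test):
    """Estimate w(x) = p_test(x) / p_train(x) via a train-vs-test classifier."""
    X = np.vstack([X_train, X_test])
    d = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])  # 0=train, 1=test
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p_test = clf.predict_proba(X_train)[:, 1]
    # By Bayes' rule, p_test(x) / p_train(x) is proportional to P(test|x) / P(train|x).
    return (p_test / (1.0 - p_test)) * (len(X_train) / len(X_test))

# Illustrative data: training inputs concentrated around 0, test inputs shifted
# towards 1.5 (covariate shift), while the conditional p(y|x) stays the same.
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=0.0, scale=1.0, size=(200, 1))
y_tr = np.sin(X_tr[:, 0]) + 0.1 * rng.standard_normal(200)
X_te = rng.normal(loc=1.5, scale=0.5, size=(200, 1))

w = estimate_importance_weights(X_tr, X_te)
model = Ridge().fit(X_tr, y_tr, sample_weight=w)  # importance-weighted fit
```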


Related resources

Continuous Target Shift Adaptation in Supervised Learning

Supervised learning in machine learning concerns inferring an underlying relation between covariate x and target y based on training covariate-target data. It is traditionally assumed that training data and test data, on which the generalization performance of a learning algorithm is measured, follow the same probability distribution. However, this standard assumption is often violated in many ...


Semi-supervised speaker identification under covariate shift

In this paper, we propose a novel semi-supervised speaker identification method that can alleviate the influence of non-stationarity such as session-dependent variation, changes in the recording environment, and physical conditions/emotions. We assume that the voice quality variations follow the covariate shift model, where only the voice feature distribution changes between the training and test phases. ...


Covariate Shift Adaptation by Importance Weighted Cross Validation

A common assumption in supervised learning is that the input points in the training set follow the same probability distribution as the input points that will be given in the future test phase. However, this assumption is not satisfied, for example, when extrapolating outside the training region. The situation where the training input points and test input points follow different distr...
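Importance-weighted cross-validation (IWCV) applies the same reweighting to model selection: the held-out loss is weighted by w(x) so that the cross-validation score approximates the generalization error under the test input distribution. The sketch below, which reuses the weights and data names from the earlier sketch, is an illustrative implementation under those assumptions, not the exact procedure of the cited paper.

```python
# Minimal sketch of importance-weighted cross-validation (IWCV):
# the validation loss is weighted by w(x) = p_test(x) / p_train(x) so that
# the CV score reflects performance under the test input distribution.

import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge

def iwcv_score(X, y, w, alpha, n_splits=5):
    """Importance-weighted K-fold CV estimate of squared error for Ridge(alpha)."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    losses = []
    for tr_idx, va_idx in kf.split(X):
        model = Ridge(alpha=alpha).fit(X[tr_idx], y[tr_idx], sample_weight=w[tr_idx])
        residual = y[va_idx] - model.predict(X[va_idx])
        # Each held-out loss is weighted by the importance of its input point.
        losses.append(np.average(residual ** 2, weights=w[va_idx]))
    return float(np.mean(losses))

# Example usage with X_tr, y_tr, and w from the previous sketch:
# best_alpha = min([0.01, 0.1, 1.0, 10.0],
#                  key=lambda a: iwcv_score(X_tr, y_tr, w, a))
```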


Computationally Efficient Class-Prior Estimation under Class Balance Change Using Energy Distance

In many real-world classification problems, the class balance often changes between training and test datasets, due to sample selection bias or the non-stationarity of the environment. Naive classifier training under such changes of class balance systematically yields a biased solution. It is known that such a systematic bias can be corrected by weighted training according to the test class bal...
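Under class-balance change, the weighting is applied per class: each training example of class y is weighted by the ratio of the test-time class prior to the training-time class prior. The sketch below assumes the test class priors have already been estimated (for example, by the energy-distance method of the cited paper, which is not shown here); the numeric priors, data, and model are purely illustrative.

```python
# Minimal sketch of classifier training corrected for class-balance change:
# each training example of class y is weighted by pi_test(y) / pi_train(y).

import numpy as np
from sklearn.linear_model import LogisticRegression

def class_balance_weights(y_train, test_priors):
    """Per-sample weights pi_test(y) / pi_train(y) for the training labels."""
    classes, counts = np.unique(y_train, return_counts=True)
    train_priors = dict(zip(classes, counts / counts.sum()))
    return np.array([test_priors[c] / train_priors[c] for c in y_train])

# Illustrative imbalanced training set (about 80% class 0) versus an assumed
# balanced test class prior of 50/50.
rng = np.random.default_rng(1)
y_tr = (rng.random(500) < 0.2).astype(int)
X_tr = rng.normal(size=(500, 2)) + y_tr[:, None] * 1.5

w = class_balance_weights(y_tr, test_priors={0: 0.5, 1: 0.5})
clf = LogisticRegression().fit(X_tr, y_tr, sample_weight=w)  # prior-corrected fit
```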


Learning under Non-stationarity: Covariate Shift Adaptation by Importance Weighting

The goal of supervised learning is to estimate an underlying input-output function from its input-output training samples so that output values for unseen test input points can be predicted. A common assumption in supervised learning is that the training input points follow the same probability distribution as the test input points. However, this assumption is not satisfied, for example, when o...




Publication date: 2013